An Introduction to
Heuristic Optimization Techniques

Introduction

Imagine that you have a kind of "black box" which computes a mathematical function, but you do not know which. All you can do is feed the black box with an input number and receive the output. How can you find out what input you have to give to get a desired output, or at least an output close to the desired output?

The answer is simple: There is no other possibility than guessing. You have to feed the black box with various numbers and check what output you get. You decide yourself how many runs you do. After n runs, you check what input resulted in the output that came closest to the desired output. This is the solution for the problem. It may not be the optimal solution, and for this reason this solution is called heuristic.

With this simple example we have demonstrated what heuristic optimization is all about: finding a solution to a problem that is perhaps not the optimal solution, but comes as close as possible. Such approaches are used when it is, for some reason, not possible to compute an exact solution (as in the example: because the function is unknown and so we are unable to mathematically derive the inverse function) or when the computation of the exact solution is expensive (e.g. in NP-complete problems such as the traveling salesperson problem).

There are many different heuristic optimization techniques. It has been shown mathematically that, averaged over for all problems, these techniques have the same average run time. [1] However, for some problems particular techniques might be more efficient than others. It depends on the structure of the problem. If you have additional information on the nature of the problem, you can choose a technique that is more efficient than other techniques.

For instance, with the example problem: If you know that the black box function is a steady function (i.e. any section of the graph can be drawn with a pencil from one end to the other end without you ever having to lift the pencil) and you want to find the maximal output value, it makes sense to use linear interpolation. Instead of bombarding the function with random numbers, first you only choose two numbers x and y (where x < y) and compute f(x) and f(y). If f(y) > f(x), a good approach would be to choose a next number z > y and then compare f(y) with f(z), while if f(y) < f(x) you choose a number z < x and compare f(z) with f(x). This procedure is repeated until you either get the optimal result, or a chosen number of iterations has been made and you decide to stop.

With the particular problem (a black box function about which you know that it is STEADY), this approach is usually more efficient than random guessing. However, there is a snag to it which occurs with many heuristic optimization techniques: You might get stuck in local optima. Imagine you have a graph of a function that partly looks like a parabola, but has a steep slope far outside that parabola. If the first input number x is somewhere in that parabola, you will most likely find the maximum point of that parabola using linear interpolation, but you will probably not find the steep slope outside of that parabola which might contain points having even larger values. Due to this you will find a local optimum, but not the global optimum.

This problem can be solved by inserting randomization in the algorithm, thus combining the two techniques of random guessing and interpolation. Simply try some random, distant values x once after several steps of linear interpolation in order to see whether they yield better results.

Genetic Programming

Heuristic optimization may not only be used to compute input values of black box functions, but it can also be used to reverse-engineer these functions, generating functions closely resembling them. This can be done using a data structure such as a tree which models the function, and using heuristic optimization techniques to modify this function, coming closer and closer to the behaviour of the original (black box) function. A technique especially useful for this is genetic programming. Genetic programming was invented by John Koza [2] and belongs to the family of heuristic optimization techniques known as "evolutionary algorithms". These algorithms have in common that they are based on the biological concept of Darwinian evolution. At the beginning a "population" of several candidate solutions are generated. From these solutions the best few are selected and modified by means of various operations called "mutation". Together with a couple of new solutions created by means of "recombination" these modified solutions form the second population, and the entire process is repeated until a solution close enough to the desired one has been found. While the earlier evolutionary approaches focused on the evolution of parameters, in genetic programming the subjects of evolution are mathematical formulas and computer programs in general.

The following program, written in the D programming language [3], demonstrates how a tree describing a mathematical formula can be built and evaluated using a stack machine based on Reverse Polish Notation (RPN) [4] as used in old Hewlett-Packard pocket calculators.

/+  Claus-Dieter Volko
 +  cdvolko@gmail.com
 +/

import std.c.stdio;
import std.math;

enum Statement
{
    Add,
    Subtract,
    Multiply,
    Divide,
    Constant,
    Parameter,
    Function
}

class Tree
{
private:
    Statement statement = Statement.Constant;
    int value = 0;
    int function (int, int) func = null;
    Tree subtree1 = null;
    Tree subtree2 = null;
    Tree paternal = null;

public:
    public this ()
    {
    }

    public this (Tree p)
    {
        paternal = p;
    }

    public this (Statement s, int v, Tree t1, Tree t2)
    {
        statement = s;
        value = v;
        subtree1 = t1;
        subtree2 = t2;
    }

    public void setStatement (Statement s)
    {
        statement = s;
    }

    public void setValue (int v)
    {
        value = v;
    }

    public void setFunc (int function (int, int) f)
    {
        func = f;
    }

    public void setSubtree1 (Tree t)
    {
        subtree1 = t;
    }

    public void setSubtree2 (Tree t)
    {
        subtree2 = t;
    }

    public Tree getSubtree1 ()
    {
        return subtree1;
    }

    public Tree getSubtree2 ()
    {
        return subtree2;
    }

    public Tree getPaternal ()
    {
        return paternal;
    }

    public int evaluate (int [] parameters ...)
    {
        final switch (statement)
        {
        case Statement.Add:
            return subtree1.evaluate (parameters)
             + subtree2.evaluate (parameters);

        case Statement.Subtract:
            return subtree1.evaluate (parameters)
             - subtree2.evaluate (parameters);

        case Statement.Multiply:
            return subtree1.evaluate (parameters)
             * subtree2.evaluate (parameters);

        case Statement.Divide:
            return subtree1.evaluate (parameters)
             / subtree2.evaluate (parameters);

        case Statement.Parameter:
            return parameters [value];

        case Statement.Constant:
            return value;

        case Statement.Function:
            return func (subtree1.evaluate (parameters),
             subtree2.evaluate (parameters));
        }
    }

    public void print_inorder ()
    {
        final switch (statement)
        {
        case Statement.Add:
            printf ("(");
            subtree1.print_inorder ();
            printf (" + ");
            subtree2.print_inorder ();
            printf (")");
            break;

        case Statement.Subtract:
            printf ("(");
            subtree1.print_inorder ();
            printf (" - ");
            subtree2.print_inorder ();
            printf (")");
            break;

        case Statement.Multiply:
            printf ("(");
            subtree1.print_inorder ();
            printf (" * ");
            subtree2.print_inorder ();
            printf (")");
            break;

        case Statement.Divide:
            printf ("(");
            subtree1.print_inorder ();
            printf (" / ");
            subtree2.print_inorder ();
            printf (")");
            break;

        case Statement.Parameter:
            printf ("%c", value + 'a');
            break;

        case Statement.Constant:
            printf ("%d", value);
            break;

        case Statement.Function:
            printf ("function (");
            subtree1.print_inorder ();
            printf (", ");
            subtree2.print_inorder ();
            printf (")");
            break;
        }
    }
}

Tree stack_machine (Tree tree, ...)
{
    Tree currentTree = tree;
    int i;

    void nextpos_literal ()
    {
        if (currentTree.paternal !is null)
        {
            if (currentTree == currentTree.getPaternal ().getSubtree1 ())
                currentTree = currentTree.getPaternal ();
            if (currentTree.getPaternal () !is null && currentTree
             == currentTree.getPaternal ().getSubtree2 ())
            {
                currentTree = currentTree.getPaternal ();
                currentTree.setSubtree1 (new Tree (currentTree));
                currentTree = currentTree.getSubtree1 ();
            }
        }
    }

    void nextpos_statement ()
    {
        currentTree.setSubtree2 (new Tree (currentTree));
        currentTree = currentTree.getSubtree2 ();
    }

    for (i = 0; i < _arguments.length - 1; i++)
        if (_arguments [i] == typeid (char))
            _argptr += int.sizeof;
        else if (_arguments [i] == typeid (int))
            _argptr += int.sizeof;

    for (i = _arguments.length - 1; i >= 0; i--)
	{
        if (_arguments [i] == typeid (char))
        {
            switch (*cast (char *) _argptr)
            {
            case '+':
                currentTree.setStatement (Statement.Add);
                nextpos_statement ();
                break;

            case '-':
                currentTree.setStatement (Statement.Subtract);
                nextpos_statement ();
                break;

            case '*':
                currentTree.setStatement (Statement.Multiply);
                nextpos_statement ();
                break;

            case '/':
                currentTree.setStatement (Statement.Divide);
                nextpos_statement ();
                break;

            default:
                if ((*cast (char *) _argptr) >= 'a'
                 && (*cast (char *) _argptr) <= 'z')
                {
                    currentTree.setStatement (Statement.Parameter);
                    currentTree.setValue (*cast (char *) _argptr - 'a');
                    nextpos_literal ();
                }
                break;
            }
            _argptr -= int.sizeof;
        }
        else if (_arguments [i] == typeid (int))
        {
            currentTree.setStatement (Statement.Constant);
            currentTree.setValue (*cast (int *) _argptr);
            nextpos_literal ();
            _argptr -= int.sizeof;
        }
        else if (_arguments [i] == typeid (int function (int, int )))
        {
            currentTree.setStatement (Statement.Function);
            currentTree.setFunc (*cast (int function (int, int) *) _argptr);
            nextpos_statement ();
            _argptr -= (int function (int, int)).sizeof;
        }
    }

    return tree;
}

int power (int x, int y)
{
    return x^^y;
}

int main (string [] args)
{
    Tree tree = new Tree;

    void print_eval_tree (int e)
    {
        tree.print_inorder ();
        printf (" = %d\n", e);
    }

    print_eval_tree
     (stack_machine (tree, 2, 3, '+').evaluate ());
    print_eval_tree
     (stack_machine (tree, 2, 3, 5, '+', '+').evaluate ());
    print_eval_tree
     (stack_machine (tree, 2, 3, '+', 5, 4, '+', '+').evaluate ());
    print_eval_tree
     (stack_machine (tree, 2, 3, 5, '+', '*').evaluate ());
    print_eval_tree
     (stack_machine (tree, 2, 3, '*', 5, 4, '*', '+').evaluate ());
    print_eval_tree
     (stack_machine (tree, 2, 3, '*', 5, 4, 3, '*', '+', '+').evaluate ());
    print_eval_tree
     (stack_machine (tree, 2, 2, '+', 3, 6, '+', '*', 5, 4, 3, '*', '+',
     '+').evaluate ());
    print_eval_tree
     (stack_machine (tree, 2, 'a', '+').evaluate (3));
    print_eval_tree
     (stack_machine (tree, 2, 'a', '+').evaluate (4));
    print_eval_tree
     (stack_machine (tree, 'a', 2, '+').evaluate (5));
    print_eval_tree
     (stack_machine (tree, 'a', 'b', '+').evaluate (2, 6));
    print_eval_tree
     (stack_machine (tree, 'a', 'b', &power).evaluate (2, 6));
    print_eval_tree
     (stack_machine (tree, 8, 10, 2, '/', '-').evaluate ());
    return 0;
}

How can genetic programming be implemented with such a tree structure?

Mutation: Mutation can be the deletion of a (randomly chosen) subtree, the replacement of a literal with a new subtree or the replacement of an operation with another one.

Recombination: In order to recombine two trees, a position in the trees where they differ from each other must be found, and these two subtrees are then swapped.

More Traditional Approaches

Evolutionary algorithms can be combined with more traditional approaches to create more complex algorithms. Such combined algorithms are called memetic algorithms. Some traditional approaches to heuristic optimization are:

Local search: Start with a (random) solution and generate random "neighbor" solutions, that is solutions that are different from the original one in a defined manner (e.g. one parameter). There are various variants of local search: One is simply about generating a new neighbor solution and comparing it with the original solution; if the new solution is better, you discard the previous one and compute the next solution from the neighborhood of this solution. This strategy is called "next improvement". Another one is "best improvement": In this approach, you generate various neighbor solutions and choose the best.

Greedy algorithm: This may be useful for generating an initial solution that can be improved by means of local search. Choose a random value for one first parameter, then randomly choose a second parameter and try all possibilities until you find the best value. Proceed this way until the last parameter has been fixed. This approach can be used to gain a rather good initial solution for a problem involving several parameters with discrete values.

Greedy random adaptive search procedure (GRASP): This is a modification of the greedy algorithm in which not always the best possibility is chosen. From among a range of possibilities fulfilling a given criterion, a random one is chosen. This approach helps avoiding local optima.

Variable neighborhood descent (VND): There are many ways of defining neighborhoods. One neighborhood may be defined by the modification of one parameter. Another neighborhood may comprise solutions that differ from the original solution in two parameters, etc. VND is a modification of local search in which the neighborhood structure using which new solutions are generated is switched whenever a certain criterion is fulfilled (e.g., no reasonably good improvement for a couple of iterations).

Applications in Computer Graphics

There are as many applications of heuristic optimization techniques as there are not-easy-to-compute problems. In computer graphics, these techniques may be used to compute functions describing curves and other complex graphs.

Example Program

To demonstrate heuristic optimization, here is an example program written in C++ which solves the following problem: Given is a set of numbers (integers). You would like to distribute the elements of this set among a given number of subsets in such a way that the difference between the largest sum of the elements of a subset and the smallest sum is as little as possible (if possible zero). You can get a good initial solution using a greedy construction heuristics, and by exchanging elements between the subsets with the currently largest and smallest sums, you can gradually improve the solution.

When the program asks you to enter the number of subsets, try writing the same number as you entered for the number of elements. In this way you trigger a special use case that scans all subset sizes from 2 to n - 1 (n being the number of elements).

I deliberately left plenty of room for optimization. For instance, the function SortSubsets currently employes selection sort, which is one of the slowest known sorting algorithms.

/*  Claus-Dieter Volko
 *  cdvolko@gmail.com
 */

// Remove // to compile for (verbose) debug mode
// #define DEBUG

#include <stdio.h>
#include <conio.h>
#include <algorithm>

void Initialize (int *n, int *k, int *m, int **set, int ***subset, int **subset_n, 
                 int **subset_sum)
{
	int i;

	printf ("Number of elements: ");
	scanf ("%d", n);
	*set = new int [*n];
	printf ("Number of subsets: ");
	scanf ("%d", k);
	*subset = new int * [*k];
	*subset_n = new int [*k];
	*subset_sum = new int [*k];
	for (i = 0; i < *k; i++)
	{
		(*subset) [i] = new int [*n - 1];
		(*subset_n) [i] = 0;
		(*subset_sum) [i] = 0;
	}
	for (i = 0; i < *n; i++)
	{
		printf ("Enter element %d: ", i);
		scanf ("%d", &((*set) [i]));
	}
	*m = 0;
	for (i = 0; i < *n; i++)
		*m += (*set) [i];
	*m /= *k;
	std::sort (*set, *set + *n);
}

void Delete (int k, int *set, int **subset)
{
	int i;

	delete [] set;
	for (i = 0; i < k; i++)
		delete [] subset [i];
	delete [] subset;
}

inline void PrintSet (int n, int *set)
{
	int i;

	for (i = 0; i < n; i++)
		printf ("%d ", set [i]);

	printf ("\n");
}

void SortSubsets (int k, int **subset, int *subset_n, int *subset_sum)
{
	int i, j, best, *ptmp, tmp;

	for (i = 0; i < k; i++)
	{
		for (j = i, best = i; j < k; j++)
			if (subset_sum [j] < subset_sum [best])
				best = j;

		if (best != i)
		{
			ptmp = subset [i];
			subset [i] = subset [best];
			subset [best] = ptmp;
			tmp = subset_n [i];
			subset_n [i] = subset_n [best];
			subset_n [best] = tmp;
			tmp = subset_sum [i];
			subset_sum [i] = subset_sum [best];
			subset_sum [best] = tmp;
		}
	}
}

void GreedyConstruction (int n, int k, int m, int *set, int **subset, int *subset_n, 
                         int *subset_sum)
{
	int i, j;

	for (i = n; i > n - k; i--)
	{
		*(subset [n - i] + subset_n [n - i]) = set [i - 1];
		subset_n [n - i]++;
		subset_sum [n - i] += set [i - 1];
	}

	while (i + 1)
	{
		for (j = 0; j < k && subset_sum [j] + set [i] > m; j++);
		if (j == k) 
			j--;
		*(subset [j] + subset_n [j]) = set [i];
		subset_n [j]++;
		subset_sum [j] += set [i];
		i--;
	}

	for (i = 0; i < k; i++)
		std::sort (subset [i], subset [i] + subset_n [i]);

	SortSubsets (k, subset, subset_n, subset_sum);
}

void PrintSets (int k, int **subset, int *subset_n)
{
	int i;

	for (i = 0; i < k; i++)
	{
		printf ("Subset %d: ", i);
		PrintSet (subset_n [i], subset [i]);
	}
}

int FindClosestElement (int d, int *set, int n)
{
	int top = 0, bottom = n - 1, current = -1, previous;
	
	do
	{
		previous = current;
		current = (bottom + top) / 2;
		if (set [current] == d)
			return current;
		else if (set [current] < d)
			top = current;
		else
			bottom = current;
	} while (previous != current);

	return current;
}

void InsertElement (int element, int subset_index, int **subset, int *subset_n, int *subset_sum)
{
	subset [subset_index] [subset_n [subset_index]] = element;
	subset_n [subset_index]++;
	subset_sum [subset_index] += element;
	std::sort (subset [subset_index], subset [subset_index] + subset_n [subset_index]);
}

void RemoveElement (int index_x, int subset_index, int **subset, int *subset_n, int *subset_sum)
{
	int i;

	subset_sum [subset_index] -= subset [subset_index] [index_x];

	for (i = index_x + 1; i < subset_n [subset_index]; i++)
		subset [subset_index] [i - 1] = subset [subset_index] [i];

	subset_n [subset_index]--;
}

void Optimize (int k, int m, int **subset, int *subset_n, int *subset_sum)
{
	bool exchange;
	int index_x, d;

	do
	{
		d = subset_sum [k - 1] - subset_sum [0];
		index_x = FindClosestElement (d / 2, subset [k - 1], subset_n [k - 1]);
		if (subset [k - 1] [index_x] < d)
		{
			InsertElement (subset [k - 1] [index_x], 0, subset, subset_n, subset_sum);
			RemoveElement (index_x, k - 1, subset, subset_n, subset_sum);
			SortSubsets (k, subset, subset_n, subset_sum);
			exchange = true;
		}
		else
			exchange = false;

#ifdef DEBUG
		printf ("\nCurrent solution:\n");
		PrintSets (k, subset, subset_n);
		getch ();
#endif
	} while (exchange);
}

int CalculateFitness (int k, int *subset_sum)
{
	return subset_sum [k - 1] - subset_sum [0];
}

void Reset (int k, int *subset_n, int *subset_sum)
{
	int i;

	for (i = 0; i < k; i++)
	{
		subset_n [i] = 0;
		subset_sum [i] = 0;
	}
}

void main ()
{
	int n, k, m, *set, **subset, *subset_n, *subset_sum, i;

	Initialize (&n, &k, &m, &set, &subset, &subset_n, &subset_sum);

	if (n == k)
	{
		k--;
		i = 2;
	}
	else
		i = k;

	for (; i <= k; i++)
	{
		Reset (k, subset_n, subset_sum);

		GreedyConstruction (n, i, m, set, subset, subset_n, subset_sum);

#ifdef DEBUG
		printf ("\nInitial solution for %d subsets after greedy construction"
                        " heuristics:\n", i);
		PrintSets (i, subset, subset_n);
		printf ("Fitness of this solution: %d\n", CalculateFitness (i, subset_sum));
		getch ();
#endif

		Optimize (i, m, subset, subset_n, subset_sum);

		printf ("\nFinal solution for %d subsets after optimization:\n", i);
		PrintSets (i, subset, subset_n);
		printf ("Fitness of this solution: %d\n", CalculateFitness (i, subset_sum));
		getch ();
	}

	Delete (k, set, subset);
}

Links related to this article

[1] No free lunch in search and optimization
[2] John Koza
[3] D (programming language)
[4] Reverse Polish notation

Adok